[Day26] - Introducing the Turtle Island Package (1)

2025 iThome Ironman Contest

DAY 26

Software Development

Part 26 of the Polars熊霸天下 series

17th Ironman Contest python polars turtle island

Jerry Wu

2025-10-02 08:58:47

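Turtle Island (Note 1) is a package I wrote. Its goal is to let users focus on writing exprs rather than on the surrounding boilerplate code.

Today I will share the core idea behind Turtle Island and its main features (Note 2).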

Today's outline:

0. Modules imported today and preparation
1. Installation
2. Core idea
2.1 pl.DataFrame-level operations
2.2 pl.Expr-level operations
3. Bonus

0. Modules imported today and preparation

import polars as pl


df = pl.DataFrame(
    {
        "col1": [1, 2, 3, 4, 5],
        "col2": [6, 7, 8, 9, 10],
        "col3": [11, 12, 13, 14, 15],
    }
)

shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ 6    ┆ 11   │
│ 2    ┆ 7    ┆ 12   │
│ 3    ┆ 8    ┆ 13   │
│ 4    ┆ 9    ┆ 14   │
│ 5    ┆ 10   ┆ 15   │
└──────┴──────┴──────┘

df2 = pl.DataFrame(
    {
        "x": [[1, 2, 3], [4, 5, 6]],
        "y": [[7, 8, 9], [10, 11, 12]],
    }
)

shape: (2, 2)
┌───────────┬──────────────┐
│ x         ┆ y            │
│ ---       ┆ ---          │
│ list[i64] ┆ list[i64]    │
╞═══════════╪══════════════╡
│ [1, 2, 3] ┆ [7, 8, 9]    │
│ [4, 5, 6] ┆ [10, 11, 12] │
└───────────┴──────────────┘

1. Installation

Turtle Island is not yet published on PyPI, so it has to be installed from the GitHub repo, for example:

pip install git+https://github.com/jrycw/turtle-island.git

Or with uv:

uv add git+https://github.com/jrycw/turtle-island.git

To avoid naming conflicts, I recommend importing Turtle Island as follows:

import turtle_island as ti

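2. Core idea

The core idea of Turtle Island is to work with pl.Expr-level functions rather than pl.DataFrame-level ones wherever possible.

As an example, let's use df to compare the pl.DataFrame-level and pl.Expr-level approaches to the following requirement:

For odd-numbered rows (the first, third, and so on), the values in the "col1" and "col2" columns stay unchanged.
For even-numbered rows (the second, fourth, and so on), the values in the "col1" and "col2" columns are taken from the "col3" column.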

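2.1 `pl.DataFrame`-level operations

At the pl.DataFrame level, two steps have to be carried out in order:

Use pl.DataFrame.with_row_index() to add an "index" column that serves as the row index (Note 3).
Inside the .with_columns() context, use when-then-otherwise to write the value-selection logic based on the mod operation.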

(
    df.with_row_index().with_columns(
        pl.when(pl.col("index").mod(2).eq(0))
        .then(pl.col("col1", "col2"))
        .otherwise("col3"),
    )
)

shape: (5, 4)
┌───────┬──────┬──────┬──────┐
│ index ┆ col1 ┆ col2 ┆ col3 │
│ ---   ┆ ---  ┆ ---  ┆ ---  │
│ u32   ┆ i64  ┆ i64  ┆ i64  │
╞═══════╪══════╪══════╪══════╡
│ 0     ┆ 1    ┆ 6    ┆ 11   │
│ 1     ┆ 12   ┆ 12   ┆ 12   │
│ 2     ┆ 3    ┆ 8    ┆ 13   │
│ 3     ┆ 14   ┆ 14   ┆ 14   │
│ 4     ┆ 5    ┆ 10   ┆ 15   │
└───────┴──────┴──────┴──────┘

The "index" column is not something we care about, yet it has to be added first before the mod operation can be performed.

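2.2 `pl.Expr`-level operations

ti.is_every_nth_row() takes n and returns True or False for each row, depending on whether that row falls on every n-th position (with n=2, the first, third, and fifth rows are True). It can be used directly inside a context, for example: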

df.with_columns(
    ti.case_when(
        case_list=[(ti.is_every_nth_row(2), pl.col("col1", "col2"))],
        otherwise="col3",
    )
)

shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ 6    ┆ 11   │
│ 12   ┆ 12   ┆ 12   │
│ 3    ┆ 8    ┆ 13   │
│ 14   ┆ 14   ┆ 14   │
│ 5    ┆ 10   ┆ 15   │
└──────┴──────┴──────┘

Here, ti.case_when() is the syntactic sugar Turtle Island provides for when-then-otherwise.

Turtle Island completes the required operation without adding an "index" column.

3. Bonus

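Because most of the functions Turtle Island provides return an expr, they can also be used inside .list.eval().

Take ti.cycle() as an example: it shifts the elements of the selected expr down by one row and "pushes" the last element to the front, for example:

df.select(ti.cycle(pl.all()))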

shape: (5, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╡
│ 5    ┆ 10   ┆ 15   │
│ 1    ┆ 6    ┆ 11   │
│ 2    ┆ 7    ┆ 12   │
│ 3    ┆ 8    ┆ 13   │
│ 4    ┆ 9    ┆ 14   │
└──────┴──────┴──────┘

When run inside .list.eval(), it produces the same downward "push" as expected, for example:

df2.with_columns(pl.all().list.eval(ti.cycle(pl.element(), 2)))

shape: (2, 2)
┌───────────┬──────────────┐
│ x         ┆ y            │
│ ---       ┆ ---          │
│ list[i64] ┆ list[i64]    │
╞═══════════╪══════════════╡
│ [2, 3, 1] ┆ [8, 9, 7]    │
│ [5, 6, 4] ┆ [11, 12, 10] │
└───────────┴──────────────┘

This shows ti.cycle() "pushing" the elements down by two positions.
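To make the rotation semantics concrete, here is a small pure-Python model built on collections.deque.rotate(). The function cycle_sketch is a hypothetical stand-in that only mirrors the outputs shown above; it says nothing about how ti.cycle() is implemented.

```python
from collections import deque


def cycle_sketch(values, offset=1):
    """Hypothetical model: rotate `values` by `offset` positions, wrapping around."""
    rotated = deque(values)
    rotated.rotate(offset)  # a positive offset moves elements toward the end
    return list(rotated)


print(cycle_sketch([1, 2, 3, 4, 5]))  # [5, 1, 2, 3, 4]
print(cycle_sketch([1, 2, 3], 2))     # [2, 3, 1]
print(cycle_sketch([10, 11, 12], 2))  # [11, 12, 10]
```

The first call matches the "col1" column of the df example, and the other two match rows of the df2 example.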